Skewed Base Compositions, Asymmetric Transition Matrices, and Phylogenetic Invariants
Abstract
Evolutionary inference methods that assume equal DNA base compositions and symmetric nucleotide substitution matrices are likely, where these assumptions do not hold, to group species on the basis of similar base compositions rather than true phylogenetic relationships. We propose an invariants-based method for dealing with this problem. An invariant Q_T of a tree T under a k-state Markov model, where a generalized time parameter is identified with the E edges of T, allows us to recognize whether data on N observed species can be associated with the N terminal vertices of T, in the sense of having been generated on T rather than on any other tree with N terminals. The generalized time parameter takes the form of a matrix with positive determinant in some semigroup S of stochastic matrices. The invariance is with respect to the choice of the set of E matrices in S, one associated with each of the E edges of T. We apply a general "empirical" method of finding invariants of a parametrized functional form. It involves calculating the probability f of all k^N data possibilities for each of m sets of E matrices in S associated with the edges of T, then solving for the parameters of Q using the m equations of the form Q(f) = 0. We discuss the problems of finding asymmetric models that satisfy semigroup closure, of finding asymmetric models that admit invariants at all, and of the computational complexity of the method. We propose a class of semigroups S_c containing matrices of form [formula: see text] to account for A+T versus G+C asymmetries in DNA base composition. Quadratic invariants are obtained for rooted trees with three and with four terminals. In the latter case the smallest set of algebraically independent invariants is sought. These invariants are applied to data pertaining to fungal evolution and to the origin of mitochondria as bacterial endosymbionts.
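The "empirical" procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's setup: a toy 2-state model, a hypothetical semigroup of symmetric stochastic matrices (not the S_c class, whose form is not given here), a uniform root distribution, and a linear rather than quadratic functional form Q(f) = sum of c_i * f_i. The helper names (sym_matrix, pattern_probs, null_space, random_edges) are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product
import random

K = 2  # number of states (toy choice; DNA would use k = 4)
N = 3  # number of terminals

def sym_matrix(a):
    """Hypothetical toy semigroup: symmetric 2x2 stochastic matrices.
    The determinant 1 - 2a is positive when a < 1/2."""
    return [[1 - a, a], [a, 1 - a]]

def pattern_probs(edges):
    """Probabilities f of all K**N terminal patterns on the rooted tree
    ((1,2),3). edges = (M1, M2, M3, M12): three pendant edges plus the
    edge from the root to the ancestor of terminals 1 and 2."""
    M1, M2, M3, M12 = edges
    root = [Fraction(1, K)] * K  # assumed uniform root distribution
    f = []
    for x1, x2, x3 in product(range(K), repeat=N):
        total = Fraction(0)
        for r in range(K):      # state at the root
            for h in range(K):  # state at the internal vertex
                total += (root[r] * M12[r][h] * M1[h][x1]
                          * M2[h][x2] * M3[r][x3])
        f.append(total)
    return f

def null_space(rows):
    """Basis of all c with row . c = 0 for every row (exact Gauss-Jordan)."""
    rows = [list(r) for r in rows]
    ncols = len(rows[0])
    pivots, rank = [], 0
    for col in range(ncols):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        lead = rows[rank][col]
        rows[rank] = [v / lead for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [v - factor * w for v, w in zip(rows[i], rows[rank])]
        pivots.append(col)
        rank += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free] = Fraction(1)
        for r, p in enumerate(pivots):
            vec[p] = -rows[r][free]
        basis.append(vec)
    return basis

def random_edges(rng):
    """One set of E = 4 matrices drawn from the toy semigroup."""
    return tuple(sym_matrix(Fraction(rng.randint(1, 49), 100)) for _ in range(4))

rng = random.Random(0)
m = 20  # number of sampled matrix sets, i.e. number of equations Q(f) = 0
samples = [pattern_probs(random_edges(rng)) for _ in range(m)]
invariants = null_space(samples)  # coefficient vectors c of linear invariants

for c in invariants:
    print([str(v) for v in c])
```

Each row of `samples` contributes one equation Q(f) = 0, and the null space collects every coefficient vector that satisfies all m of them. In this toy symmetric model the recovered linear invariants should be the relations equating the probability of a pattern with that of its complement, so they do not discriminate between trees. Searching for quadratic invariants, as the abstract describes, would presumably mean replacing each row f with the vector of all degree-2 monomials in f before solving.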
Related resources
Toric Ideals of Phylogenetic Invariants
Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an abelian group. ...
Phylogenetic invariants for more general evolutionary models.
An invariant Q of a tree T under a k-state Markov model, where a generalized time parameter is identified with the E edges of T, allows us to recognize whether data on N observed species (usually, N DNA sequences, one from each species) can be associated with the N leaves of T in the sense of having been generated on T rather than on any other N-leaf tree. The form of the generalized time param...
The Strand Symmetric Model
Important special cases of strand symmetric Markov models are the group-based phylogenetic models, including the Jukes-Cantor model and the Kimura 2 and 3 parameter models. The general strand symmetric model, or in this chapter just the strand symmetric model (SSM), has only these eight equalities of probabilities in the transition matrices and no further restriction on the transition probabilities...
q-Cartan matrices and combinatorial invariants of derived categories for skewed-gentle algebras
Cartan matrices are of fundamental importance in representation theory. For algebras defined by quivers (i.e. directed graphs) with relations the computation of the entries of the Cartan matrix amounts to counting nonzero paths in the quivers, leading naturally to a combinatorial setting. In this paper we study a refined version, so-called q-Cartan matrices, where each nonzero path is weighted ...
Counting phylogenetic invariants in some simple cases.
An informal degrees of freedom argument is used to count the number of phylogenetic invariants in cases where we have three or four species and can assume a Jukes-Cantor model of base substitution with or without a molecular clock. A number of simple cases are treated and in each the number of invariants can be found. Two new classes of invariants are found: non-phylogenetic cubic invariants te...
Journal:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 1, Issue 1
Pages: -
Publication date: 1994